Kenneth Tay
Oct 10, 2019
dplyr
select(): pick variables/columns by their names
mutate(): create new variables/columns based on existing ones
arrange(): reorder rows
filter(): pick rows by their values
summarize(): collapse many rows down to a single summary
group_by(): perform operations at a group level
ALL of these functions take a dataset as input.
The dataset is either given as the first argument or passed in with %>%.
ALL of these functions return a dataset!
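The two ways of supplying the dataset can be sketched as follows. This is a minimal illustration that assumes the dplyr package is installed; the names `by_argument` and `by_pipe` are made up for the example.

```r
library(dplyr)

# Style 1: give the dataset as the first argument.
by_argument <- select(mtcars, wt, mpg)

# Style 2: pass the dataset in with %>%.
by_pipe <- mtcars %>% select(wt, mpg)

# Both calls return a dataset (here, a data frame) with the same contents.
identical(by_argument, by_pipe)
is.data.frame(by_pipe)
```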
You can do three things with this returned dataset:
%>% syntax with dplyr
Take the mtcars dataset, select just the wt and mpg columns, then select rows with mpg < 15
%>% syntax with dplyr
+ syntax with ggplot2
tidyr package: gather() and separate()
A function is a named block of code which takes inputs, does something with them, and returns an output.
We’ve already seen a number of functions in R! For example,
## [1] TRUE
The function is.character takes the input given to it in the parentheses and returns TRUE or FALSE, depending on whether the input is of type character or not.
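As a small illustration of this behavior, the two calls below use made-up inputs (a string and a number) chosen only for the example:

```r
# The input "hello" is a character string, so this returns TRUE.
is.character("hello")

# The input 42 is a number, not a character string, so this returns FALSE.
is.character(42)
```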
Others we’ve seen: str(), head(), rm(), ggplot(), select(), …
We can see what a function does by typing in ? followed by the function name in the R console.
The most important syntax in R is the function call. All R syntax has function calls underlying it.
A function call consists of the function name followed by parentheses containing the arguments.
## [1] NA
## [1] -1
abs(x): If x is positive, return x. If x is negative, return x without the negative sign.
## [1] 2.6
%>%
%>% is implemented by the magrittr package
When the dplyr package is loaded, magrittr is loaded too
%>% is “syntactic sugar”: makes code easier to understand
What is on the left of %>% becomes the first argument in the function on the right of %>%
## [1] 2.6
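The first-argument rule can be sketched with abs(). The value -2.6 and the variable name `x` are assumptions chosen for this example; it loads magrittr directly so that it does not depend on dplyr.

```r
library(magrittr)

x <- -2.6

# Ordinary function call: x is the first argument of abs().
abs(x)

# Same call written with %>%: x on the left becomes the first argument.
x %>% abs()
```

The value is stored in `x` first because writing `-2.6 %>% abs()` would apply the minus sign after the pipe, not before it.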
%>% syntax with dplyr
Take the mtcars dataset, select just the wt and mpg columns, then select rows with mpg < 15
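One way to write this task, shown both as nested calls and with %>%. This is a sketch assuming dplyr is installed; the names `nested` and `piped` are illustrative.

```r
library(dplyr)

# Nested calls: read from the inside out.
nested <- filter(select(mtcars, wt, mpg), mpg < 15)

# Piped calls: read from top to bottom, one step per line.
piped <- mtcars %>%
  select(wt, mpg) %>%
  filter(mpg < 15)

# Both versions give the same dataset.
identical(nested, piped)
```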
+ syntax with ggplot2
+ for ggplot2 only?
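For comparison with %>%, here is a minimal sketch of the + syntax, assuming ggplot2 is installed. The choice of a scatterplot of mpg against wt is an assumption made for illustration.

```r
library(ggplot2)

# Start a plot with ggplot(), then add a layer to it with +.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Printing the object draws the plot.
p
```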
Question: How do we find out what a function does? What inputs does it accept, what does it output, etc…
First answer: Google it! Google “R <function name>”
A (probably) better answer: Documentation in R itself!
sample(): Description
sample(): Usage
What comes after the = sign: default value for that argument
sample(): Arguments
sample(): Details
sample(): Value
## [1] 8 7 2 3 1 5 9 10 4 6
## [1] 10 8 9 3 1 5 6 4 2 7
## [1] 9 2 1 10 10 7 10 7 8 3
## [1] 6 2 5 9 4 1 8 7 10 3
## [1] 10 6 4 1 7 3 5 4 6 1
## [1] 3 8 7 6 2
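The calls below sketch how the defaults in the Usage section of ?sample play out, where the usage is sample(x, size, replace = FALSE, prob = NULL). The seed and the variable names are assumptions added for the example, and these are not necessarily the calls that produced the output above.

```r
set.seed(1)  # assumption: seed added only to make the sketch reproducible

# With only x given, size defaults to the length of x and replace to FALSE,
# so this returns a random permutation of 1 to 10.
perm <- sample(1:10)

# Override the default replace = FALSE: values may now repeat.
with_repeats <- sample(1:10, replace = TRUE)

# Override the default size: draw only 5 values, without replacement.
five <- sample(1:10, size = 5)
```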
tidyr::gather()
E.g. dataset of no. of cases for each country
## # A tibble: 3 x 3
## country `1999` `2000`
## <chr> <dbl> <dbl>
## 1 Afghanistan 745 2666
## 2 Brazil 37737 80488
## 3 China 212258 213766
How to make a line plot of no. of cases by year for each country?
Probably want something like a plot with year on the x-axis, cases on the y-axis, and one line per country.
Problem: Column names are values of the variable year.
Solution: Reshape dataset:
## # A tibble: 6 x 3
## country year cases
## <chr> <chr> <dbl>
## 1 Afghanistan 1999 745
## 2 Brazil 1999 37737
## 3 China 1999 212258
## 4 Afghanistan 2000 2666
## 5 Brazil 2000 80488
## 6 China 2000 213766
Solution: Reshape dataset using tidyr’s gather()
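A sketch of the reshaping step and the resulting line plot, assuming tidyr, tibble and ggplot2 are installed. The dataset is rebuilt by hand from the table shown earlier; the names `cases_wide`, `cases_long` and `p` are made up for the example.

```r
library(tidyr)
library(tibble)
library(ggplot2)

# Wide dataset: the column names 1999 and 2000 are values of the variable year.
cases_wide <- tibble(
  country = c("Afghanistan", "Brazil", "China"),
  `1999`  = c(745, 37737, 212258),
  `2000`  = c(2666, 80488, 213766)
)

# Gather the two year columns into a key column (year) and a value column (cases).
cases_long <- gather(cases_wide, key = "year", value = "cases", `1999`, `2000`)
cases_long

# With year and cases as columns, the line plot is a standard ggplot call.
# group = country is needed because year is stored as character.
p <- ggplot(cases_long, aes(x = year, y = cases, col = country, group = country)) +
  geom_line()
```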
tidyr::separate()
E.g. dataset of rate (cases / population) for each country
## # A tibble: 6 x 3
## country year rate
## <chr> <dbl> <chr>
## 1 Afghanistan 1999 745/19987071
## 2 Afghanistan 2000 2666/20595360
## 3 Brazil 1999 37737/172006362
## 4 Brazil 2000 80488/174504898
## 5 China 1999 212258/1272915272
## 6 China 2000 213766/1280428583
How to get cases and population into columns of their own?
Solution: Use tidyr’s separate()
## # A tibble: 6 x 4
## country year cases population
## <chr> <dbl> <chr> <chr>
## 1 Afghanistan 1999 745 19987071
## 2 Afghanistan 2000 2666 20595360
## 3 Brazil 1999 37737 172006362
## 4 Brazil 2000 80488 174504898
## 5 China 1999 212258 1272915272
## 6 China 2000 213766 1280428583
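A sketch of a separate() call that gives a result of this shape, assuming tidyr and tibble are installed. The dataset is rebuilt by hand from the table shown earlier; the names `rates` and `separated` are made up for the example.

```r
library(tidyr)
library(tibble)

rates <- tibble(
  country = c("Afghanistan", "Afghanistan", "Brazil", "Brazil", "China", "China"),
  year    = c(1999, 2000, 1999, 2000, 1999, 2000),
  rate    = c("745/19987071", "2666/20595360", "37737/172006362",
              "80488/174504898", "212258/1272915272", "213766/1280428583")
)

# Split the rate column at "/" into two new columns.
separated <- separate(rates, rate, into = c("cases", "population"), sep = "/")
separated
```

The new columns are still of type character, as in the printed output above.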
None to D4: drought levels of increasing severity
Optional material
tidyr functions: gather and spread
gather: Used when some column names are not variables, but values of a variable
spread: Opposite of gather
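To illustrate spread as the opposite of gather, this sketch starts from the long cases dataset and spreads it back out. It assumes tidyr and tibble are installed; the names `cases_long` and `cases_wide` are made up for the example.

```r
library(tidyr)
library(tibble)

# Long dataset: one row per country and year.
cases_long <- tibble(
  country = c("Afghanistan", "Brazil", "China", "Afghanistan", "Brazil", "China"),
  year    = c("1999", "1999", "1999", "2000", "2000", "2000"),
  cases   = c(745, 37737, 212258, 2666, 80488, 213766)
)

# Spread the values of year into column names, filling the cells with cases.
cases_wide <- spread(cases_long, key = year, value = cases)
cases_wide
```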
tidyr functions: separate and unite
separate: Used to separate values in one column into multiple columns
unite: Opposite of separate
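To illustrate unite as the opposite of separate, this sketch pastes cases and population back into a single rate column. It assumes tidyr and tibble are installed, and uses only the 1999 rows of the earlier dataset to keep the example short; the names `separated` and `united` are made up for the example.

```r
library(tidyr)
library(tibble)

# Subset of the separated dataset (1999 rows only).
separated <- tibble(
  country    = c("Afghanistan", "Brazil", "China"),
  year       = c(1999, 1999, 1999),
  cases      = c("745", "37737", "212258"),
  population = c("19987071", "172006362", "1272915272")
)

# Combine cases and population into one column, with "/" between the values.
united <- unite(separated, col = "rate", cases, population, sep = "/")
united
```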